
\section{Introduction}

Decision making, planning, and learning in multi-agent systems (henceforth, MAS) are challenging and important problems with many applications in areas such as robotics, telecommunications, economics, distributed control, auctions, traffic-light control, and games.
Classical algorithms for planning and learning in Markov Decision Processes (MDPs), such as value iteration \cite{Bel} and Q-learning \cite{QLEARNING}, often do not suffice in multi-agent settings: the presence of other learning agents makes the environment appear non-stationary from the perspective of any individual agent, so these algorithms may fail to converge \cite{GTandMARL}.
Furthermore, in cooperative MAS, coordination may be required to choose among multiple optimal joint actions, whereas in competitive settings it may be difficult or even impossible to define a sound optimality criterion.
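
To see where the non-stationarity enters, recall the standard single-agent Q-learning update (stated here in generic notation for illustration, not in the notation of the DG setting):
\begin{equation*}
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right],
\end{equation*}
whose convergence guarantee assumes that the transition and reward distributions are fixed. When other agents adapt their policies, the distribution of $(r, s')$ given $(s, a)$ shifts over time, and the fixed-point argument underlying the guarantee no longer applies.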

The domination game (DG) was explicitly designed as a test bed for multi-agent planning and learning algorithms, and it is at the high end of complexity along several dimensions:
\begin{enumerate}[I]\itemsep2pt
 \item It has both cooperative and competitive elements.
 \item The state-action space is large and continuous.
 \item The environment is partially observable for each agent, and even the joint observation does not disambiguate the full state.
 \item Running a game is computationally expensive, so gathering experience is slow and learning algorithms must have low sample complexity.
\end{enumerate}

More specifically, the game is played by two teams of six agents that aim to maintain control over so-called ``Control Points'' (CPs) in order to score points.
Agents can collect ammo, which spawns at a fixed location on each map, and use it to shoot other agents.
The maps are generated randomly before each game.
In the rest of this article, we assume the reader is familiar with the DG.
Further details about game dynamics can be found in the DG documentation \cite{DGDocs}.

The rest of the article is structured as follows.
In Section~\ref{sec:related_work}, we discuss previous work and its relation to the present work.
In Section~\ref{sec:methodology}, we describe our techniques and the considerations that informed our design decisions.
In Section~\ref{sec:experiments}, we describe our experimental setup and present the results, which we analyze in Section~\ref{sec:analysis}.
We conclude the article and give some pointers for future work in Section~\ref{sec:conclusion}.
